Results 1 - 19 of 19
1.
BMC Genomics ; 24(1): 266, 2023 May 18.
Article in English | MEDLINE | ID: covidwho-2321452

ABSTRACT

BACKGROUND: The prevalence of the COVID-19 disease in recent years and its widespread impact on mortality, as well as various aspects of life around the world, has made it important to study this disease and its viral cause. However, very long sequences of this virus increase the processing time, complexity of calculation, and memory consumption required by the available tools to compare and analyze the sequences. RESULTS: We present a new encoding method, named PC-mer, based on the k-mer and physicochemical properties of nucleotides. This method minimizes the size of encoded data by around 2 k times compared to the classical k-mer based profiling method. Moreover, using PC-mer, we designed two tools: 1) a machine-learning-based classification tool for coronavirus family members with the ability to receive input sequences from the NCBI database, and 2) an alignment-free computational comparison tool for calculating dissimilarity scores between coronaviruses at the genus and species levels. CONCLUSIONS: PC-mer achieves 100% accuracy despite the use of very simple classification algorithms based on Machine Learning. Assuming dynamic programming-based pairwise alignment as the ground truth approach, we achieved a degree of convergence of more than 98% for coronavirus genus-level sequences and 93% for SARS-CoV-2 sequences using PC-mer in the alignment-free classification method. This outperformance of PC-mer suggests that it can serve as a replacement for alignment-based approaches in certain sequence analysis applications that rely on similarity/dissimilarity scores, such as searching sequences, comparing sequences, and certain types of phylogenetic analysis methods that are based on sequence comparison.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Phylogeny , Sequence Analysis, DNA , Nucleotides/genetics , Base Sequence , Algorithms
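For orientation, the classical k-mer profile that record 1 uses as its baseline can be sketched as below. This is not PC-mer itself, whose encoding the abstract does not specify; the function name `kmer_profile` and the choice to skip windows containing ambiguous bases are assumptions made for illustration. The sketch shows why the classical profile grows quickly: it has one entry for each of the 4^k possible k-mers.

```python
from itertools import product


def kmer_profile(seq, k):
    """Return normalized counts for all 4**k k-mers over A, C, G, T.

    Illustrative baseline only; this is not the PC-mer encoding.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    # Fixed ordering (lexicographic over ACGT) so profiles are comparable.
    return [counts[m] / total if total else 0.0 for m in kmers]


profile = kmer_profile("ATGCGATATGC", 2)
print(len(profile))
```

Two sequences of different lengths map to vectors of the same length, which is what allows alignment-free comparison with ordinary distance measures.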
2.
Big Data Analytics in Chemoinformatics and Bioinformatics: with Applications to Computer-Aided Drug Design, Cancer Biology, Emerging Pathogens and Computational Toxicology ; : 359-390, 2022.
Article in English | Scopus | ID: covidwho-2280488

ABSTRACT

This chapter gives a detailed presentation of the theoretical background and computational approaches to the utility of alignment-free sequence descriptors and multidimensional variable reduction methods in the characterization and visualization of biological sequence data. The utility of such novel methods developed by the authors of this chapter is shown using data on case studies of severe acute respiratory syndrome, Middle East respiratory syndrome, Coronavirus disease-2019, and Zika viruses. © 2023 Elsevier Inc. All rights reserved.

3.
J Comput Biol ; 30(4): 469-491, 2023 04.
Article in English | MEDLINE | ID: covidwho-2255052

ABSTRACT

The massive amount of genomic data appearing for SARS-CoV-2 since the beginning of the COVID-19 pandemic has challenged traditional methods for studying its dynamics. As a result, new methods such as Pangolin, which can scale to the millions of samples of SARS-CoV-2 currently available, have appeared. Such a tool is tailored to take as input assembled, aligned, and curated full-length sequences, such as those found in the GISAID database. As high-throughput sequencing technologies continue to advance, such assembly, alignment, and curation may become a bottleneck, creating a need for methods that can process raw sequencing reads directly. In this article, we propose Reads2Vec, an alignment-free embedding approach that can generate a fixed-length feature vector representation directly from the raw sequencing reads without requiring assembly. Furthermore, since such an embedding is a numerical representation, it may be applied to highly optimized classification and clustering algorithms. Experiments on simulated data show that our proposed embedding obtains better classification results and better clustering properties than existing alignment-free baselines. In a study on real data, we show that alignment-free embeddings have better clustering properties than the Pangolin tool and that the spike region of the SARS-CoV-2 genome heavily informs the alignment-free clusterings, which is consistent with current biological knowledge of SARS-CoV-2.


Subject(s)
COVID-19 , Pangolins , Humans , Animals , Pandemics , SARS-CoV-2/genetics , COVID-19/genetics , High-Throughput Nucleotide Sequencing/methods
4.
1st International Conference on Ambient Intelligence in Health Care, ICAIHC 2021 ; 317:459-468, 2023.
Article in English | Scopus | ID: covidwho-2173926

ABSTRACT

During the COVID-19 pandemic, several genetic mutations occurred in the SARS-CoV-2 virus, making it more infectious or transmissible. The World Health Organization (WHO) tracks and classifies variants as variants of concern (VOCs) or variants of interest (VOIs), depending on the level of transmissibility and dominance of the variant in the regions. The classification and identification of variants usually occur through sequence alignment techniques, which are computationally complex, making them unfeasible to classify thousands of sequences simultaneously. In this work, an application of the alignment-free method BASiNETEntropy is proposed for the classification of the variants of concern of SARS-CoV-2. The method initially maps the biological sequences as a complex network. From this, the most informative edges are selected through the entropy maximization principle, yielding a filtered network containing only the most informative edges. Complex network topological measurements are then extracted and used as feature vectors in the classification process. Sequences of SARS-CoV-2 variants of concern extracted from NCBI were used to assess the method. Experimental results show that extracted features can classify the variants of concern with high assertiveness, considering few features, contributing to the reduction of the feature space. Besides classifying the variants of concern, unique patterns (motifs) were also extracted for each variant, relative to the SARS-CoV-2 reference sequence. The proposed method is implemented as open source in the R language and is freely available at https://cran.r-project.org/web/packages/BASiNETEntropy/. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

5.
11th International Conference on Computational Advances in Bio and Medical Sciences, ICCABS 2021 ; 13254 LNBI:133-148, 2022.
Article in English | Scopus | ID: covidwho-2148575

ABSTRACT

The massive amount of genomic data appearing over the past two years for SARS-CoV-2 has challenged traditional methods for studying the dynamics of the COVID-19 pandemic. As a result, new methods, such as the Pangolin tool, have appeared which can scale to the millions of samples of SARS-CoV-2 currently available. Such a tool is tailored to take assembled, aligned and curated full-length sequences, such as those provided by GISAID, as input. As high-throughput sequencing technologies continue to advance, such assembly, alignment and curation may become a bottleneck, creating a need for methods which can process raw sequencing reads directly. In this paper, we propose several alignment-free embedding approaches, which can generate a fixed-length feature vector representation directly from the raw sequencing reads, without the need for assembly. Moreover, because such an embedding is a numerical representation, it can be passed to already highly optimized clustering methods such as k-means. We show that the clusterings we obtain with the proposed embeddings are more suited to this setting than the Pangolin tool, based on several internal clustering evaluation metrics. Moreover, we show that a disproportionate number of positions in the spike region of the SARS-CoV-2 genome are informing such clusterings (in terms of information gain), which is consistent with current biological knowledge of SARS-CoV-2. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.

6.
Springer Series in Reliability Engineering ; : 347-360, 2023.
Article in English | Scopus | ID: covidwho-1990573

ABSTRACT

The COrona VIrus Disease (COVID-19) pandemic led to the occurrence of several variants with time. This has led to an increased importance of understanding sequence data related to COVID-19. In this chapter, we propose an alignment-free k-mer based LSTM (Long Short-Term Memory) deep learning model that can classify 20 different variants of COVID-19. We handle the class imbalance problem by sampling a fixed number of sequences for each class label. We handle the vanishing gradient problem in LSTMs arising from long sequences by dividing the sequence into fixed lengths and obtaining results on individual runs. Our results show that one-vs-all classifiers have test accuracies as high as 92.5% with tuned hyperparameters compared to the multi-class classifier model. Our experiments show higher overall accuracies for B.1.1.214, B.1.177.21, B.1.1.7, B.1.526, and P.1 on the one-vs-all classifiers, suggesting the presence of distinct mutations in these variants. Our results show that embedding vector size and batch sizes have insignificant improvement in accuracies, but changing from 2-mers to 3-mers mostly improves accuracies. We also studied individual runs which show that most accuracies improved after the 20th run, indicating that these sequence positions may have more contributions to distinguishing among different COVID-19 variants. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

7.
Commun Integr Biol ; 15(1): 96-104, 2022.
Article in English | MEDLINE | ID: covidwho-1784243

ABSTRACT

SARS-CoV-2 is suspected to be the product of a natural or artificial recombination of two viruses - one adapted to the horseshoe bat and the other, donor of the spike protein gene, adapted to an unknown species. Here we used a new method to search for the original host of the ancestor of the SARS-CoV-2 virus and for the donor of its gene for the spike protein, the molecule responsible for binding to and entering human cells. We computed immunological T-distances (the number of different peptides that are present in the viral proteins but absent in proteins of the host) between 11 species of coronaviruses and 38 representatives of the main mammal clades. Analyses of pentapeptides, the presumed principal targets of T-cell non-self recognition, showed that the spike protein of SARS-CoV-2 had the smallest T-distance to humans, while the rest of the SARS-CoV-2 proteome had the smallest T-distance to the horseshoe bat. This suggests that the ancestor of SARS-CoV-2 was adapted to bats, but the spike gene donor was adapted to humans. Further analyses suggest that the ancestral coronavirus adapted to bats was briefly passaged in treeshrews, while the donor of the spike gene was briefly passaged in rats before the recombination event.
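The T-distance defined in record 7 (the number of distinct peptides present in the viral proteins but absent from the host proteins) reduces to a set difference. The following is a minimal sketch under that reading; the function names `peptides` and `t_distance` and the toy sequences are hypothetical, and the authors' actual pipeline may differ in details the abstract does not give.

```python
def peptides(proteins, k=5):
    """Collect every distinct overlapping k-peptide from a list of protein sequences."""
    return {p[i:i + k] for p in proteins for i in range(len(p) - k + 1)}


def t_distance(viral_proteins, host_proteins, k=5):
    """Count distinct k-peptides found in the virus but not in the host.

    k=5 corresponds to the pentapeptides mentioned in the abstract.
    """
    return len(peptides(viral_proteins, k) - peptides(host_proteins, k))


# Toy sequences (hypothetical, for illustration only).
virus = ["MKTAYIA"]
host = ["MKTAYI"]
print(t_distance(virus, host))
```

A smaller value means fewer viral peptides would be recognized as non-self by that host, which is how the abstract interprets adaptation.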

8.
J Comput Biol ; 29(5): 453-464, 2022 05.
Article in English | MEDLINE | ID: covidwho-1758600

ABSTRACT

In this work, we investigate using Fourier coefficients (FCs) for capturing useful information about viral sequences in a computationally efficient and compact manner. Specifically, we extract geographic submission location from SARS-CoV-2 sequence headers submitted to the GISAID Initiative, calculate corresponding FCs, and use the FCs to classify these sequences according to geographic location. We show that the FCs serve as useful numerical summaries for sequences that allow manipulation, identification, and differentiation via classical mathematical and statistical methods that are not readily applicable for character strings. Further, we argue that subsets of the FCs may be usable for the same purposes, which results in a reduction in storage requirements. We conclude by offering extensions of the research and potential future directions for subsequent analyses, such as the use of other series transforms for discretely indexed signals such as genomes.


Subject(s)
COVID-19 , SARS-CoV-2 , Benchmarking , Genome, Viral , Humans , Phylogeny , SARS-CoV-2/genetics
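To make the idea in record 8 concrete, here is a small sketch of computing Fourier coefficients for a genome treated as a discretely indexed signal. The abstract does not say how nucleotides are mapped to numbers, so the per-base indicator encoding below is an assumption, as are the function names; the transform is a plain O(n^2) discrete Fourier transform written with the standard library for clarity, not speed.

```python
import cmath


def indicator(seq, base):
    """Assumed encoding: 1.0 where the sequence equals `base`, else 0.0."""
    return [1.0 if c == base else 0.0 for c in seq.upper()]


def fourier_coefficients(signal):
    """Plain discrete Fourier transform: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


coeffs = fourier_coefficients(indicator("ACGA", "A"))
# Keeping only the first few coefficients gives a shorter numerical summary,
# in the spirit of the abstract's remark about using subsets of the FCs.
summary = coeffs[:2]
print(summary)
```

For real genomes a fast Fourier transform from a numerical library would be the practical choice; the coefficients are the same up to floating-point error.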
9.
Curr Genomics ; 22(8): 583-595, 2021 Dec 31.
Article in English | MEDLINE | ID: covidwho-1699391

ABSTRACT

Background: A newly emerging novel coronavirus appeared and rapidly spread worldwide, and the World Health Organization declared a pandemic on March 11, 2020. The roles and characteristics of coronavirus have captured much attention due to its power of causing a wide variety of infectious diseases, from mild to severe, in humans. The detection of the lethality of human coronavirus is key to estimating the viral toxicity and providing perspectives for treatment. Methods: We developed an alignment-free framework that utilizes machine learning approaches for an ultra-fast and highly accurate prediction of the lethality of human-adapted coronavirus using genomic sequences. We performed extensive experiments through six different feature transformation and machine learning algorithms combining digital signal processing to identify the lethality of possible future novel coronaviruses using existing strains. Results: The results tested on SARS-CoV, MERS-CoV and SARS-CoV-2 datasets show an average 96.7% prediction accuracy. We also provide preliminary analysis validating the effectiveness of our models through other human coronaviruses. Our framework achieves high levels of prediction performance that is alignment-free and based on RNA sequences alone without genome annotations and specialized biological knowledge. Conclusion: The results demonstrate that, for any novel human coronavirus strains, this study can offer a reliable real-time estimation for its viral lethality.

10.
IEEE Access ; 8: 195263-195273, 2020.
Article in English | MEDLINE | ID: covidwho-1604602

ABSTRACT

The world is grappling with the COVID-19 pandemic caused by the 2019 novel SARS-CoV-2. To better understand this novel virus and its relationship with other pathogens, new methods for analyzing the genome are required. In this study, intrinsic dinucleotide genomic signatures were analyzed for whole genome sequence data of eight pathogenic species, including SARS-CoV-2. The genome sequences were transformed into dinucleotide relative frequencies and classified using the extreme gradient boosting (XGBoost) model. The classification models were trained to a) distinguish between the sequences of all eight species and b) distinguish between sequences of SARS-CoV-2 that originate from different geographic regions. Our method attained 100% in all performance metrics and for all tasks in the eight-species classification problem. Moreover, the models achieved 67% balanced accuracy for the task of classifying the SARS-CoV-2 sequences into the six continental regions and achieved 86% balanced accuracy for the task of classifying SARS-CoV-2 samples as either originating from Asia or not. Analysis of the dinucleotide genomic profiles of the eight species revealed a similarity between the SARS-CoV-2 and MERS-CoV viral sequences. Further analysis of SARS-CoV-2 viral sequences from the six continents revealed that samples from Oceania had the highest frequency of TT dinucleotides as well as the lowest CG frequency compared to the other continents. The dinucleotide signatures of AC, AG, CA, CT, GA, GT, TC, and TG were well conserved across most genomes, while the frequencies of other dinucleotide signatures varied considerably. Altogether, the results from this study demonstrate the utility of dinucleotide relative frequencies for discriminating and identifying similar species.
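The transformation of a genome into a 16-value dinucleotide profile, as described in record 10, might look like the sketch below. It assumes "relative frequency" means each overlapping dinucleotide's share of all valid dinucleotides; the abstract does not define the term, and other definitions exist (for example, an odds ratio against single-nucleotide frequencies). The function name is hypothetical, and the XGBoost classification step is not shown.

```python
from collections import Counter


def dinucleotide_frequencies(seq):
    """Return the share of each of the 16 dinucleotides among overlapping pairs.

    Assumption: pairs containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if set(p) <= set("ACGT")]
    counts = Counter(pairs)
    total = len(pairs)
    return {
        a + b: (counts[a + b] / total if total else 0.0)
        for a in "ACGT"
        for b in "ACGT"
    }


freqs = dinucleotide_frequencies("ACGTTTCG")
print(freqs["TT"], freqs["CG"])
```

The resulting dictionary has a fixed key order, so its values can be used directly as a feature vector for a classifier.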

11.
Infect Genet Evol ; 96: 105106, 2021 12.
Article in English | MEDLINE | ID: covidwho-1506080

ABSTRACT

Coronaviruses (especially SARS-CoV-2) are characterized by rapid mutation and wide spread. As these characteristics easily lead to global pandemics, studying the evolutionary relationship between viruses is essential for clinical diagnosis. DNA sequencing has played an important role in evolutionary analysis. Recent alignment-free methods can overcome the problems of traditional alignment-based methods, which consume both time and space. This paper proposes a novel alignment-free method called the correlation coefficient feature vector (CCFV), which defines a correlation measure of the L-step delay of a nucleotide location from its location in the original DNA sequence. The numerical feature is a 16×L-dimensional numerical vector describing the distribution characteristics of the nucleotide positions in a DNA sequence. The proposed L-step delay correlation measure is interestingly related to some types of L+1 spaced mers. Unlike traditional gene comparison, our method avoids the computational complexity of multiple sequence alignment, and hence improves the speed of sequence comparison. Our method is applied to evolutionary analysis of the common human viruses including SARS-CoV-2, Dengue virus, Hepatitis B virus, and human rhinovirus and achieves the same or even better results than alignment-based methods. Especially for SARS-CoV-2, our method also confirms that bats are potential intermediate hosts of SARS-CoV-2.


Subject(s)
Genome, Viral/genetics , Phylogeny , Sequence Analysis, DNA/methods , Coronavirus/genetics , Dengue Virus/genetics , Hepatitis B/genetics , Humans , Models, Genetic , Rhinovirus/genetics , SARS-CoV-2/genetics , Sequence Alignment
12.
Biology (Basel) ; 10(9)2021 Aug 31.
Article in English | MEDLINE | ID: covidwho-1438503

ABSTRACT

The study of viral diversity is imperative in understanding sequence change and its implications for intervention strategies. The widely used alignment-dependent approaches to study viral diversity are limited in their utility as sequence dissimilarity increases, particularly when expanded to the genus or higher ranks of viral species lineage. Herein, we present an alignment-independent algorithm, implemented as a tool, UNIQmin, to determine the effective viral sequence diversity at any rank of the viral taxonomy lineage. This is done by performing an exhaustive search to generate the minimal set of sequences for a given viral non-redundant sequence dataset. The minimal set is comprised of the smallest possible number of unique sequences required to capture the diversity inherent in the complete set of overlapping k-mers encoded by all the unique sequences in the given dataset. Such dataset compression is possible through the removal of unique sequences, whose entire repertoire of overlapping k-mers can be represented by other sequences, thus rendering them redundant to the collective pool of sequence diversity. A significant reduction, namely ~44%, ~45%, and ~53%, was observed for all reported unique sequences of species Dengue virus, genus Flavivirus, and family Flaviviridae, respectively, while still capturing the entire repertoire of nonamer (9-mer) viral peptidome diversity present in the initial input dataset. The algorithm is scalable for big data as it was applied to ~2.2 million non-redundant sequences of all reported viruses. UNIQmin is open source and publicly available on GitHub. The concept of a minimal set is generic and, thus, potentially applicable to other pathogenic microorganisms of non-viral origin, such as bacteria.
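The notion of a sequence being redundant when its whole repertoire of overlapping k-mers is covered by other sequences (record 12) can be illustrated with a simple greedy pass. This is a swapped-in simplification, not the UNIQmin algorithm: the abstract describes an exhaustive search whose details are not given here, and a greedy pass is not guaranteed to find the smallest possible set. The function names and the shortest-first ordering are assumptions.

```python
def kmers(seq, k):
    """Set of overlapping k-mers in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reduce_redundant(sequences, k=9):
    """Greedy sketch: drop sequences whose k-mers all occur in the remaining ones.

    Not the UNIQmin algorithm; the result preserves the k-mer repertoire
    but may not be the minimal set.
    """
    kept = list(dict.fromkeys(sequences))  # remove exact duplicates, keep order
    for candidate in sorted(kept, key=len):  # assumption: try shortest first
        others = set().union(*(kmers(s, k) for s in kept if s != candidate))
        if kmers(candidate, k) <= others:
            kept.remove(candidate)
    return kept


# Toy peptide-like strings (hypothetical), using 3-mers to keep the example small.
print(reduce_redundant(["MKTAYIAK", "KTAYI", "QWERT"], k=3))
```

By construction, the union of k-mers over the kept sequences equals the union over the input, which mirrors the abstract's requirement that the reduced set still captures the full k-mer diversity.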

13.
Bioinform Biol Insights ; 15: 11779322211020316, 2021.
Article in English | MEDLINE | ID: covidwho-1367655

ABSTRACT

MOTIVATION: There is a need for rapid and easy-to-use, alignment-free methods to cluster large groups of protein sequence data. Commonly used phylogenetic trees based on alignments can be used to visualize only a limited number of protein sequences. DGraph, introduced here, is an application developed to generate 2-dimensional (2D) maps based on similarity scores for sequences. The program automatically calculates and graphically displays property distance (PD) scores based on physico-chemical property (PCP) similarities from an unaligned list of FASTA files. Such "PD-graphs" show the interrelatedness of the sequences, whereby clusters can reveal deeper connectivities. RESULTS: Property distance graphs generated for flavivirus (FV), enterovirus (EV), and coronavirus (CoV) sequences from complete polyproteins or individual proteins are consistent with biological data on vector types, hosts, cellular receptors, and disease phenotypes. Property distance graphs separate the tick-borne from the mosquito-borne FV and cluster viruses that infect bats, camels, seabirds, and humans separately. The clusters correlate with disease phenotype. The PD method segregates the β-CoV spike proteins of severe acute respiratory syndrome (SARS), severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and Middle East respiratory syndrome (MERS) sequences from other human pathogenic CoV, with clustering consistent with cellular receptor usage. The graphs also suggest evolutionary relationships that may be difficult to determine with conventional bootstrapping methods that require postulating an ancestral sequence.

14.
Int J Pept Res Ther ; 27(4): 2257-2273, 2021.
Article in English | MEDLINE | ID: covidwho-1316307

ABSTRACT

The design for vaccines using in silico analysis of genomic data of different viruses has taken many different paths, but lack of any precise computational approach has constrained them to alignment methods and some alignment-free techniques. In this work, a precise computational approach has been established wherein two new mathematical parameters have been suggested to identify the highly conserved and surface-exposed regions which are spread over a large region of the surface protein of the virus so that one can determine possible peptide vaccine candidates from those regions. The first parameter, w, is the sum of the normalized values of the measure of surface accessibility and the normalized measure of conservativeness, and the second parameter is the area of a triangle formed by a mathematical model named 2D Polygon Representation. This method has been, therefore, used to determine possible vaccine targets against SARS-CoV-2 by considering its surface-situated spike glycoprotein. The results of this model have been verified by a parallel analysis using the older approach of manually estimating the graphs describing the variation of conservativeness and surface-exposure across the protein sequence. Furthermore, the working of the method has been tested by applying it to find out peptide vaccine candidates for Zika and Hendra viruses respectively. A satisfactory consistency of the model results with pre-established results for both the test cases shows that this in silico alignment-free analysis proposed by the model is suitable not only to determine vaccine targets against SARS-CoV-2 but also ready to extend against other viruses.

15.
Comput Biol Chem ; 92: 107479, 2021 Jun.
Article in English | MEDLINE | ID: covidwho-1216310

ABSTRACT

Development of protein 3-D structural comparison methods is essential for understanding protein functions. Some amino acids share structural similarities while others vary considerably. These structures determine the chemical and physical properties of amino acids. Grouping amino acids with similar structures potentially improves the ability to identify structurally conserved regions and increases the global structural similarity between proteins. We systematically studied the effects of amino acid grouping on the numbers of Specific/specific, Common/common, and statistically different keys to achieve a better understanding of protein structure relations. Common keys represent substructures found in all types of proteins and Specific keys represent substructures exclusively belonging to a certain type of proteins in a data set. Our results show that applying amino acid grouping to the Triangular Spatial Relationship (TSR)-based method, while computing structural similarity among proteins, improves the accuracy of protein clustering in certain cases. In addition, applying amino acid grouping facilitates the process of identification or discovery of conserved structural motifs. The results from the principal component analysis (PCA) demonstrate that applying amino acid grouping captures slightly more structural variation than when amino acid grouping is not used, indicating that amino acid grouping reduces structure diversity as predicted. The TSR-based method uniquely identifies and discovers binding sites for drugs or interacting proteins. The binding sites of nsp16 of SARS-CoV-2, SARS-CoV and MERS-CoV that we have defined will aid future antiviral drug design for improving therapeutic outcome. This approach for incorporating the amino acid grouping feature into our structural comparison method is promising and provides a deeper insight into understanding of structural relations of proteins.


Subject(s)
Computer Simulation , Models, Chemical , SARS-CoV-2 , Viral Proteins/chemistry , Amino Acid Sequence , Antiviral Agents/chemistry , Binding Sites , Cluster Analysis , Imaging, Three-Dimensional , Models, Molecular , Protein Binding , Protein Conformation , COVID-19 Drug Treatment
16.
Curr Comput Aided Drug Des ; 17(7): 936-945, 2021.
Article in English | MEDLINE | ID: covidwho-1061201

ABSTRACT

INTRODUCTION: Coronaviruses comprise a group of enveloped, positive-sense single-stranded RNA viruses that infect humans as well as a wide range of animals. The study was performed on a set of 573 sequences belonging to SARS, MERS and SARS-CoV-2 (CoVID-19) viruses. The sequences were represented with alignment-free sequence descriptors and analyzed with different chemometric methods: Euclidean/Mahalanobis distances, principal component analysis and self-organizing maps (Kohonen networks). We report the cluster structures of the data. The sequences are well-clustered regarding the type of virus; however, some of them show the tendency to belong to more than one virus type. BACKGROUND: This is a study of 573 genome sequences belonging to SARS, MERS and SARS-CoV-2 (CoVID-19) coronaviruses. OBJECTIVES: The aim was to compare the virus sequences, which originate from different places around the world. METHODS: The study used alignment-free sequence descriptors for the representation of sequences and chemometric methods for analyzing clusters. RESULTS: The majority of genome sequences are clustered with respect to the virus type, but some of them are outliers. CONCLUSION: We indicate 71 sequences, which tend to belong to more than one cluster.


Subject(s)
COVID-19 , SARS-CoV-2 , Animals , Cluster Analysis , Humans
17.
Comput Biol Med ; 131: 104247, 2021 04.
Article in English | MEDLINE | ID: covidwho-1056506

ABSTRACT

A non-standard bioinformatics method, 4D-Dynamic Representation of DNA/RNA Sequences, aiming at an analysis of the information available in nucleotide databases, has been formulated. The sequences are represented by sets of "material points" in a 4D space - 4D-dynamic graphs. The graphs representing the sequences are treated as "rigid bodies" and characterized by values analogous to the ones used in the classical dynamics. As the graphical representations of the sequences, the projections of the graphs into 2D and 3D spaces are used. The method has been applied to an analysis of the complete genome sequences of the 2019 novel coronavirus. As a result, 2D and 3D classification maps are obtained. The coordinate axes in the maps correspond to the values derived from the exact formulas characterizing the graphs: the coordinates of the centers of mass and the 4D moments of inertia. The points in the maps represent sequences and their coordinates are used as the classifiers. The main result of this work has been derived from the 3D classification maps. The distribution of clusters of points which emerged in these maps, supports the hypothesis that SARS-CoV-2 may have originated in bat and in pangolin. Pilot calculations for Zika virus sequence data prove that the proposed approach is also applicable to a description of time evolution of genome sequences of viruses.


Subject(s)
Algorithms , Base Sequence , COVID-19/genetics , Computational Biology , Genome, Viral , SARS-CoV-2/genetics , Animals , Chiroptera/virology , Humans , Pangolins/virology , Phylogeny , Zika Virus/genetics , Zika Virus Infection/genetics
18.
Infect Genet Evol ; 88: 104708, 2021 03.
Article in English | MEDLINE | ID: covidwho-1039486

ABSTRACT

The pandemic due to the novel coronavirus, SARS-CoV-2, is now a serious global concern. More than a thousand new COVID-19 infections are reported daily across the globe. Thus, the medical research communities are trying to find a remedy to restrict the spread of this virus, while vaccine development is still under research in parallel. In such a critical situation, not only the medical research community but also scientists in different fields like microbiology, pharmacy, bioinformatics and data science are sharing the effort to accelerate the process of vaccine development, virus prediction, and forecasting the transmissible probability and reproduction cases of the virus for social awareness. In a similar context, in this article, we have studied sequence variability of the virus primarily focusing on three aspects: (a) sequence variability among SARS-CoV-1, MERS-CoV and SARS-CoV-2 in human host, which are in the same coronavirus family, (b) sequence variability of SARS-CoV-2 in human host for 54 different countries and (c) sequence variability between coronavirus family and country specific SARS-CoV-2 sequences in human host. For this purpose, as a case study, we have performed topological analysis of 2391 global genomic sequences of SARS-CoV-2 in association with SARS-CoV-1 and MERS-CoV using an integrated semi-alignment based computational technique. The results of the semi-alignment based technique are experimentally and statistically found to be similar to those of the alignment based technique, and the technique is computationally faster. Moreover, the outcome of this analysis can help to identify the nations with homogeneous SARS-CoV-2 sequences, so that the same vaccine can be applied to their heterogeneous human population.


Subject(s)
COVID-19/epidemiology , Coronavirus Infections/epidemiology , Genetic Variation , Genome, Viral , Pandemics , SARS-CoV-2/genetics , Severe Acute Respiratory Syndrome/epidemiology , Africa/epidemiology , Americas/epidemiology , Asia/epidemiology , Australia/epidemiology , Base Sequence , COVID-19/transmission , COVID-19/virology , Computational Biology/methods , Coronavirus Infections/transmission , Coronavirus Infections/virology , Europe/epidemiology , Host-Pathogen Interactions/genetics , Humans , Middle East Respiratory Syndrome Coronavirus/genetics , Middle East Respiratory Syndrome Coronavirus/pathogenicity , Severe acute respiratory syndrome-related coronavirus/genetics , Severe acute respiratory syndrome-related coronavirus/pathogenicity , SARS-CoV-2/pathogenicity , Sequence Alignment , Severe Acute Respiratory Syndrome/transmission , Severe Acute Respiratory Syndrome/virology
19.
Transbound Emerg Dis ; 67(4): 1453-1462, 2020 Jul.
Article in English | MEDLINE | ID: covidwho-71844

ABSTRACT

Pre-clinical responses to fast-moving infectious disease outbreaks heavily depend on choosing the best isolates for animal models that inform diagnostics, vaccines and treatments. Current approaches are driven by practical considerations (e.g. first available virus isolate) rather than a detailed analysis of the characteristics of the virus strain chosen, which can lead to animal models that are not representative of the circulating or emerging clusters. Here, we suggest a combination of epidemiological, experimental and bioinformatic considerations when choosing virus strains for animal model generation. We discuss the currently chosen SARS-CoV-2 strains for international coronavirus disease (COVID-19) models in the context of their phylogeny as well as in a novel alignment-free bioinformatic approach. Unlike phylogenetic trees, which focus on individual shared mutations, this new approach assesses genome-wide co-developing functionalities and hence offers a more fluid view of the 'cloud of variances' that RNA viruses are prone to accumulate. This joint approach concludes that while the current animal models cover the existing viral strains adequately, there is substantial evolutionary activity that is likely not considered by the current models. Based on insights from the non-discrete alignment-free approach and experimental observations, we suggest isolates for future animal models.


Subject(s)
Computational Biology , Coronavirus Infections/epidemiology , Disease Outbreaks , Genomics , Pandemics/prevention & control , Pneumonia, Viral/epidemiology , Animals , Betacoronavirus/genetics , Biological Evolution , COVID-19 , Disease Models, Animal , Humans , Phylogeny , SARS-CoV-2